fix: flatten form fields for fillable PDFs without appearance streams (#240) #242

Merged
rstrahan merged 1 commit into develop from fix/fillable-pdf
Mar 17, 2026

@rstrahan
Contributor

Summary

Follow-up to PR #241. The initial fix (init_forms() only) was insufficient — fillable PDFs that lack pre-generated appearance streams still rendered without form field values.

Root Cause (Refined)

init_forms() initializes PDFium's form rendering engine, but render(may_draw_forms=True) can only draw form fields that have appearance streams. Many fillable PDFs — especially government forms like VA Form 21-22a — don't include appearance streams for their form fields. They rely on the PDF viewer to generate them at runtime.

Fix

Added page.flatten() before each page.render() call:

  1. init_forms(): initializes the form engine (already in develop from PR #241, "fix: initialize form rendering for fillable PDFs (#240)")
  2. page.flatten() (new): forces PDFium to generate appearance streams and merge form field content into the page, making it visible to render()
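
The two steps above can be sketched as a small helper. This is an illustration only, not the code from service.py or index.py: the function name `render_pages_with_forms` and the `scale` parameter are hypothetical, and the document object is assumed to behave like a pypdfium2 `PdfDocument` (it exposes `init_forms()`, `len()`, integer indexing, and pages with `flatten()` and `render()`).

```python
from typing import Any, List


def render_pages_with_forms(pdf: Any, scale: float = 2.0) -> List[Any]:
    """Render every page so that form field values are visible.

    Hypothetical helper. `pdf` is assumed to be PdfDocument-like:
    init_forms(), len(pdf), pdf[i], and pages with flatten()/render().
    """
    # Step 1: initialize the form engine once per document,
    # before any page object is loaded.
    pdf.init_forms()

    bitmaps = []
    for index in range(len(pdf)):
        page = pdf[index]
        # Step 2: merge form field content into the page content so that
        # fields without appearance streams are still drawn.
        page.flatten()
        # Render only after flattening; may_draw_forms stays enabled.
        bitmaps.append(page.render(scale=scale, may_draw_forms=True))
    return bitmaps
```

With pypdfium2 installed, this would presumably be called as `render_pages_with_forms(pdfium.PdfDocument(path))`. Whether `flatten()` is available as a public page method depends on the pypdfium2 version in use, so treat that call as an assumption carried over from this PR.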

Changes

  • lib/idp_common_pkg/idp_common/ocr/service.py — Added page.flatten() in the rendering loop before _extract_page_image()
  • patterns/unified/src/bda_processresults_function/index.py — Added page.flatten() before page.render()
  • lib/idp_common_pkg/tests/unit/ocr/test_ocr_service.py — Updated regression test to verify both init_forms() and flatten()
  • CHANGELOG.md — Updated fix description
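
For readers who want to see the shape of such a regression check, here is a minimal sketch using `unittest.mock`. It is not the test from test_ocr_service.py: `render_first_page` is a hypothetical stand-in for the service's rendering path, and the mocked document only assumes the `init_forms()`, `flatten()`, and `render(may_draw_forms=True)` calls named in this PR.

```python
from unittest.mock import MagicMock


def render_first_page(pdf):
    """Hypothetical stand-in for the rendering path under test."""
    pdf.init_forms()
    page = pdf[0]
    page.flatten()
    return page.render(may_draw_forms=True)


def test_forms_initialized_and_flattened_before_render():
    pdf = MagicMock()
    page = MagicMock()
    pdf.__getitem__.return_value = page

    # Record the order in which the three calls happen.
    order = []
    pdf.init_forms.side_effect = lambda: order.append("init_forms")
    page.flatten.side_effect = lambda: order.append("flatten")
    page.render.side_effect = lambda **kwargs: order.append("render")

    render_first_page(pdf)

    # Both parts of the fix must run, each exactly once.
    pdf.init_forms.assert_called_once_with()
    page.flatten.assert_called_once_with()
    page.render.assert_called_once_with(may_draw_forms=True)
    # flatten() has to come after init_forms() and before render().
    assert order == ["init_forms", "flatten", "render"]
```

Checking call order as well as call counts guards against a later refactor that moves `flatten()` after `render()`, which would leave the assertions on counts passing while the rendered image loses its field values again.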

Risk Assessment

  • flatten() is a no-op for pages without form fields (returns immediately)
  • Only modifies the in-memory PDF object — original file is never changed
  • All 49 OCR tests pass, lint clean

…eams (#240)

init_forms() alone is insufficient for fillable PDFs that lack pre-generated
appearance streams for form fields (common in government forms like VA-21-22a).
page.flatten() forces PDFium to generate appearances and merge them into page
content before rendering, ensuring all form field values are visible.

Changes:
- ocr/service.py: add page.flatten() before _extract_page_image() in rendering loop
- bda_processresults_function/index.py: add page.flatten() before render()
- test_ocr_service.py: verify both init_forms() and flatten() are called
- CHANGELOG.md: update fix description with two-part explanation
rstrahan merged commit c11b181 into develop on Mar 17, 2026
1 check failed